ABSTRACT

We suggest a new algorithm to remove systematic effects in a large set of lightcurves
obtained by a photometric survey. The algorithm can remove systematic effects, like 
the ones associated with atmospheric extinction, detector efficiency, or PSF changes 
over the detector. The algorithm works without any prior knowledge of the effects, 
as long as they appear linearly in many stars of the sample. The approach, which
was originally developed to remove atmospheric extinction effects, is based on a lower 
rank approximation of matrices, an approach which was already suggested and used 
in chemometrics, for example. The proposed algorithm is especially useful in cases
where the uncertainties of the measurements are unequal. For equal uncertainties the 
algorithm reduces to the Principal Components Analysis (PCA) algorithm. We present 
a simulation to demonstrate the effectiveness of the proposed algorithm and point out
its potential, in particular in the search for transit candidates.

1 INTRODUCTION

The advent of large high-S/N CCDs for astronomical
studies has driven many photometric projects that
have already produced unprecedentedly large sets of accu- 
rate stellar lightcurves for various astronomical studies. An 
example of such a project is the OGLE search for transit
candidates, which has already yielded significant results (e.g.,
Udalski et al. 2002). Searching for low-amplitude variables,
such as planetary transits, involves finding a faint signal in 
noisy data. It is therefore of prime interest to remove any 
systematic effects hiding in the data. 

Systematic observational effects may be associated, for 
example, with the varying atmospheric conditions, the vari- 
ability of the detector efficiency or PSF changes. However, 
these effects might vary from star to star, depending on the 
stellar colour or the position of the star on the CCD, a de- 
pendence which is not always known. Therefore, the removal 
of such effects might not be trivial. 

We present here an algorithm to remove some of the 
systematic effects in a large set of lightcurves, without any 
a priori knowledge of the different observational features 
that might affect the measurements. The algorithm finds the
systematics and their manifestation in the individual stars,
as long as these effects appear in many lightcurves. 

We started the development of our algorithm in an
attempt to correct for the atmospheric extinction, with an
approach similar to that of Kruszewski & Semeniuk (2003).
We derived the best-fitting airmasses of the different images 
and the extinction coefficients of the different stars,
without having any information on the stellar colours. However,
the result is a general algorithm that deals with linear
systematic effects. It turned out that such an algorithm had
already been proposed by Gabriel & Zamir (1979), who had
applied it to data from disciplines other than astronomy,
chemometrics, for example. In some restricted cases, when
one can ignore the different uncertainties of the data points,
this algorithm reduces to the well-known Principal Component
Analysis (PCA; Murtagh & Heck 1987, Ch. 2). However,
when the uncertainties of the measurements vary substantially,
as in many photometric surveys, PCA performs
poorly, as we demonstrate below.

Section 2 presents the initial, simpler version of our
algorithm, which was meant solely to remove the colour-dependent
atmospheric extinction. Section 3 puts the algorithm
in a broader context, and shows how the algorithm can
remove linear systematic effects, and can even treat several
unknown effects. Section 4 presents a simulation to demonstrate
the effectiveness of our algorithm. We discuss some
of the algorithm properties and potential developments in
Section 5.



2 CORRECTION FOR ATMOSPHERIC 
EXTINCTION 

The colour-dependent atmospheric extinction is an obvious 
observational effect that contaminates ground-based pho- 
tometric measurements. This effect depends on the stellar 
colours, which are not always completely known. This is
especially true for photometric surveys when only one filter
is used and no explicit colour information is available.
This section describes how we find the best stellar extinc- 
tion coefficients to account for the atmospheric absorption, 
together with the most suitable airmasses assigned for each 
image. 

Consider a set of $N$ lightcurves, each of which consists
of $M$ measurements. We define the residual of each observation,
$r_{ij}$, to be the average-subtracted stellar magnitude,
i.e., the stellar magnitude after subtracting the average
magnitude of the individual star.

Let $\{a_j; j = 1, \ldots, M\}$ be the airmass at which the
$j$-th image was observed. We can then define the effective
extinction coefficient $c_i$ of star $i$ to be the slope of the best
linear fit for the residuals of this star, $\{r_{ij}; j = 1, \ldots, M\}$,
as a function of the corresponding airmasses, $\{a_j; j = 1, \ldots, M\}$.
We aim to remove the product $c_i a_j$ from each
$r_{ij}$. In fact, we search for the best $c_i$ that minimizes the
expression

$$S_i^2 = \sum_{j} \frac{(r_{ij} - c_i a_j)^2}{\sigma_{ij}^2}, \qquad (1)$$

where $\sigma_{ij}$ is the uncertainty of the measurement of star $i$ in
the image $j$.

Assuming the airmasses are known, a simple differentiation
and equating to zero yields an estimate for the extinction
coefficient:

$$c_i = \frac{\sum_j r_{ij} a_j / \sigma_{ij}^2}{\sum_j a_j^2 / \sigma_{ij}^2}. \qquad (2)$$

We can now recalculate the best-fitting coefficients, $c_i^{(1)}$, and
continue iteratively. We thus have an iterative process which
in essence searches for the two sets, $\{c_i\}$ and $\{a_j\}$, that best
account for the atmospheric extinction.

We performed many simulations that have shown that
this iterative process converged to the same $\{a_j\}$ and $\{c_i\}$,
no matter what initial values were used. Therefore, we suggest
that the proposed algorithm can find the most suitable
airmass of each image and the extinction coefficient of each
star. As the next section shows, these airmasses and coefficients
may have no relation to actual airmass and colour.
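
The alternating updates described in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function name `fit_linear_effect`, the fixed iteration count `n_iter` and the array layout (stars along rows, images along columns) are assumptions made for the example. Each update is the weighted least-squares slope obtained by holding the other set fixed.

```python
import numpy as np

def fit_linear_effect(r, sigma, a_init, n_iter=50):
    """Alternately update c_i and a_j to reduce sum((r - c_i a_j)^2 / sigma^2).

    r, sigma : (N stars, M images) residuals and their uncertainties.
    a_init   : (M,) starting guess for the per-image term (e.g. the airmass).
    n_iter   : fixed number of sweeps (an assumption; a tolerance test also works).
    """
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    r = np.asarray(r, dtype=float)
    a = np.asarray(a_init, dtype=float).copy()
    for _ in range(n_iter):
        # Per-star coefficient with the per-image term held fixed.
        c = (w * r * a).sum(axis=1) / (w * a**2).sum(axis=1)
        # Per-image term with the per-star coefficients held fixed.
        a = (w * r * c[:, None]).sum(axis=0) / (w * c[:, None] ** 2).sum(axis=0)
    return c, a
```

Only the product of the two sets is constrained by the fit, so the individual scales of the returned `c` and `a` are arbitrary; comparisons with known inputs should be made on `np.outer(c, a)`.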



3 GENERALIZATION 

The algorithm presented in the previous section is in fact
a search to find the best two sets of $\{c_i; i = 1, \ldots, N\}$ and
$\{a_j; j = 1, \ldots, M\}$ that minimize the global expression

$$S^2 = \sum_{ij} \frac{(r_{ij} - c_i a_j)^2}{\sigma_{ij}^2}. \qquad (5)$$

Therefore, although the alternating 'criss-cross' iteration
process (Gabriel & Zamir 1979) started with the actual airmasses
of the different images, the values of the final set of
parameters $\{a_j\}$ and $\{c_i\}$ are not necessarily related to the
true airmass and extinction coefficient. They are merely the
variables by which the global sum of residuals, $S^2$, varies
linearly most significantly. They could represent any strong
systematic effect that might be associated, for example, with
time, temperature or position on the CCD. This algorithm
finds the systematic effect as long as the global minimum of
$S^2$ is achieved.

Now, suppose the data includes a few different systematic
effects, with different $\{c_i\}$ and $\{a_j\}$. We can easily generalize
the algorithm to treat such a case. To do that we denote
by $\{{}^{(1)}c_i\}$ and $\{{}^{(1)}a_j\}$ the first set of parameters found
in the data. We then remove this effect and denote the new
residuals by

$${}^{(1)}r_{ij} = r_{ij} - {}^{(1)}c_i \, {}^{(1)}a_j. \qquad (6)$$

We can then proceed and search for the next linear effect,
hidden in $\{{}^{(1)}r_{ij}\}$. We use the same procedure to find
the next set, $\{{}^{(2)}c_i\}$ and $\{{}^{(2)}a_j\}$, that minimizes

$${}^{(1)}S^2 = \sum_{ij} \frac{({}^{(1)}r_{ij} - {}^{(2)}c_i \, {}^{(2)}a_j)^2}{\sigma_{ij}^2}. \qquad (7)$$


Note that the derivation of each $c_i$ is independent of all the
other $c_i$'s, but does depend on all the $\{a_j\}$.

The problem can now be turned around. Since atmo- 
spheric extinction might depend not only on the airmass 
but also on weather conditions, we can ask ourselves what is 
the most suitable "airmass" of each image, given the known 
coefficient of every star. Thus we can look for the $a_j$ that
minimizes

$$S_j^2 = \sum_{i} \frac{(r_{ij} - c_i a_j)^2}{\sigma_{ij}^2}, \qquad (3)$$

given the previously calculated set of $\{c_i\}$. The value of the
effective 'airmass' is then:

$$a_j^{(1)} = \frac{\sum_i r_{ij} c_i / \sigma_{ij}^2}{\sum_i c_i^2 / \sigma_{ij}^2}. \qquad (4)$$
This process can be applied repeatedly, until it finds no significant
linear effects in the residuals. The algorithm finds
any linear systematic effect that can be presented as $c_i a_j$ for
the $j$-th measurement of the $i$-th star.
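
The repeated application can be sketched as a loop that fits one effect, subtracts it and then fits the next one on the new residuals. The sketch below is only an illustration under stated assumptions: the name `remove_linear_effects`, the random starting vector, and the fixed number of effects and sweeps are all choices made for the example. A practical implementation would need a stopping criterion for "no significant linear effects", which is not specified here.

```python
import numpy as np

def remove_linear_effects(r, sigma, n_effects, n_iter=50, seed=0):
    """Successively fit and subtract n_effects terms of the form c_i * a_j.

    r, sigma : (N stars, M images) residuals and their uncertainties.
    Returns the cleaned residuals and the list of (c, a) pairs found.
    """
    rng = np.random.default_rng(seed)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    resid = np.array(r, dtype=float)
    effects = []
    for _ in range(n_effects):
        # Random start (an assumption): a constant start would give c = 0
        # for average-subtracted data with equal weights.
        a = rng.normal(size=resid.shape[1])
        for _ in range(n_iter):
            c = (w * resid * a).sum(axis=1) / (w * a**2).sum(axis=1)
            a = (w * resid * c[:, None]).sum(axis=0) / (w * c[:, None] ** 2).sum(axis=0)
        # Subtract the effect just found before searching for the next one.
        resid = resid - np.outer(c, a)
        effects.append((c, a))
    return resid, effects
```

Because each stage starts from a coefficient update that is optimal for its starting vector, the weighted sum of squared residuals cannot increase from one stage to the next.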

After developing our algorithm we found that such an
approach had already been proposed by Gabriel & Zamir
(1979) as a lower-rank approximation to data matrices. They
applied the algorithm to data from other disciplines, like
climate statistics and chemometrics, and discussed its convergence
properties. Very similar algorithms were developed
and applied for signal and image processing (e.g., Lu et al.
1997). If all measurements have the same uncertainties, the
algorithm will reduce to the conventional PCA that can be
applied through the SVD technique (Press et al. 1993, Ch.
2). However, when the uncertainties of the measurements are
substantially different, PCA becomes less effective at finding
and removing systematic effects. This can lead to removal of
true variability, and can leave some actual systematic effects
in the data.
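
The equal-uncertainty limit can be checked numerically. In the sketch below (function names and the random starting vector are assumptions made for illustration), the weights drop out of the two update formulas, and the alternating iteration is compared with the leading term of `numpy.linalg.svd`.

```python
import numpy as np

def rank1_crisscross_equal_sigma(r, n_iter=100, seed=0):
    """Alternating updates for equal uncertainties (the weights cancel)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=r.shape[1])      # arbitrary starting vector
    for _ in range(n_iter):
        c = r @ a / (a @ a)              # per-star coefficients
        a = r.T @ c / (c @ c)            # per-image terms
    return np.outer(c, a)

def rank1_svd(r):
    """Leading singular component of r, i.e. the first PCA term."""
    u, s, vt = np.linalg.svd(r, full_matrices=False)
    return s[0] * np.outer(u[:, 0], vt[0])
```

For a matrix with a clearly dominant first component the two rank-one approximations agree to numerical precision; the comparison is made on the product because the signs and scales of the individual factors are not unique.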



4 SIMULATION 

To demonstrate the power of the algorithm we present here 
one out of the many simulations we ran. In this specific
example we simulated lightcurves of 3000 stars in 1000 images.
All stars were set to have constant magnitudes with
normally distributed noise of various amplitudes. We added 
three systematic effects that depended on airmass, the CCD 
position and lunar phase. Finally, we added transit-like 
lightcurves to three stars. 

To simulate a realistic set of lightcurves, we assigned 
different noise levels to different stars, as if we had bright 
(high S/N) and faint (low S/N) objects. The r.m.s. ranges 
between 0.01 and 1 mag, with an average value of 0.3 mag. 
We assigned to each measurement an uncertainty which de- 
pended on the stellar standard deviation. To avoid an un- 
realistic case where all measurements of a star have the 
same uncertainty, we randomly varied the uncertainty of 
each measurement by 10 percent of the stellar r.m.s. 

We selected three lightcurves with 0.01 mag r.m.s. and
added to them planetary transit-like signals. The transit periods
were 2.7183, 3.1415 and 1.4142 days, with depths of
15, 20 and 25 mmag, respectively. 

We added three systematic effects that depended on air- 
mass (linear and quadratic effects) , on the CCD X-position 
and on the lunar phase. Thus, 

$$r_{ij} = c_i (am)_j + d_i (am)_j^2 + x_i b_j + f_i \sin(\omega_{\mathrm{lunar}} t_j) + \mathrm{Poisson\ noise} \; [+ \mathrm{Transit}], \qquad (8)$$
where: 

• The observation times $\{t_j\}$ were set to the times of the
first 1000 images of the OGLE Carina field survey, available
from the OGLE website^1.

• The airmasses $\{(am)_j\}$ were calculated using these
times and the OGLE Carina survey parameters.

• The positions on the CCD $\{x_i\}$ were randomly drawn
from a uniform distribution, between 0 and 2047.

• The coefficients $\{c_i\}$, $\{d_i\}$, $\{b_j\}$ and $\{f_i\}$ were randomly
drawn from normal distributions of zero mean. The standard
deviations of these distributions were chosen so that the four
systematic effects produced r.m.s. variability of 0.06, 0.04,
0.01 and 0.008, respectively.
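
A toy version of this setup can be generated as follows. This is a sketch under loudly stated assumptions, not a reproduction of the simulation: the epochs and airmasses are random placeholders (the OGLE Carina epochs are not used), the array sizes are reduced, the stellar r.m.s. values are drawn log-uniformly, the noise is Gaussian, no transits are injected, and the function name `simulate_residuals` is hypothetical. Only the four target r.m.s. amplitudes, the CCD range and the 10 per cent uncertainty scatter are taken from the text.

```python
import numpy as np

def simulate_residuals(n_stars=300, n_images=200, seed=0):
    """Toy residual matrix with four linear systematic effects plus noise."""
    rng = np.random.default_rng(seed)

    # ASSUMPTION: placeholder epochs (days) and airmasses, not OGLE values.
    t = np.sort(rng.uniform(0.0, 120.0, n_images))
    airmass = rng.uniform(1.0, 2.0, n_images)
    x = rng.uniform(0.0, 2047.0, n_stars)            # CCD X-position
    omega_lunar = 2.0 * np.pi / 29.53                # synodic month in days

    def scaled_effect(star_term, image_term, target_rms):
        # Outer product of a per-star and a per-image term, rescaled so
        # that its overall scatter matches the requested r.m.s.
        effect = np.outer(star_term, image_term)
        return effect * (target_rms / effect.std())

    effects = (
        scaled_effect(rng.normal(size=n_stars), airmass, 0.06)
        + scaled_effect(rng.normal(size=n_stars), airmass**2, 0.04)
        + scaled_effect(x, rng.normal(size=n_images), 0.01)
        + scaled_effect(rng.normal(size=n_stars), np.sin(omega_lunar * t), 0.008)
    )

    # ASSUMPTION: log-uniform stellar r.m.s. between 0.01 and 1 mag.
    star_rms = 10.0 ** rng.uniform(-2.0, 0.0, n_stars)
    # Uncertainty of each measurement varied by up to 10 per cent.
    scatter = rng.uniform(-1.0, 1.0, (n_stars, n_images))
    sigma = star_rms[:, None] * (1.0 + 0.1 * scatter)
    noise = rng.normal(size=(n_stars, n_images)) * sigma

    r = effects + noise
    r -= r.mean(axis=1, keepdims=True)               # average-subtracted residuals
    return r, sigma, effects
```

The returned `r` and `sigma` have the stars-by-images layout assumed by the update formulas of Section 2, so they can be passed directly to an implementation of those formulas.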

We applied our algorithm to the simulated artificial survey
four successive times, to eliminate four different linear
effects (see Section 5 for a discussion of the number of effects
to subtract). For comparison, we applied the PCA subtraction
to the same data set the same number of times.

The efficacy of the algorithm is demonstrated in two figures.
Fig. 1 presents 2000 randomly selected measurements,
before and after the two algorithms were applied. Panel (A)
shows the difference between the magnitudes before and after
the systematic effects were added. This difference, which
is actually the exact amount added by the systematic effects,
is plotted as a function of the r.m.s. of each star. We see that
the typical systematic error is about 0.1-0.2 mag. For the
"faint" stars, with r.m.s. of about 0.4-0.7 mag, the systematic
error is relatively small. However, for the "bright" stars
in the sample, with inherent r.m.s. smaller than, say, 0.05
mag, the additional systematic error is relatively large. This
added noise can seriously hamper the ability to detect small
effects like planetary transits.

^1 http://sirius.astrouw.edu.pl/~ogle

Panel (B) shows the systematic effects left after PCA 
was applied four times, again as a function of the stellar 
r.m.s. Had PCA worked perfectly, all differences would have 
been nullified, and all points in the diagram would have been 
concentrated on the horizontal line that goes through zero. 
We can see that PCA managed to correct all the systematic 
effects larger than 0.05 mag, but failed to perform for smaller 
systematic errors. 

Panel (C) of Fig. 1 presents the same 2000 measurements,
this time after applying our algorithm four times.
We see that the ability of the algorithm to remove the sys- 
tematic error depends strongly on the stellar inherent r.m.s. 
The algorithm performs substantially better when the stel- 
lar r.m.s. is small. For those stars, the advantage of our 
algorithm over the PCA approach seems clear. In fact, all 
natural candidates for transit detection are exactly those 
stars. 

The ability to detect transits is depicted in Fig. 2, where
we focus on the three stars with simulated transits. Each 
column presents the stellar folded lightcurve before and af- 
ter the successive iterations were applied. The data were 
folded with the transit period, and were plotted around the 
mid-transit phase. While initially the systematic errors com- 
pletely masked the transits, the three of them gradually sur- 
faced as more iterations were applied. 
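
The folding step can be sketched as below. The function name `fold_near_transit` and the mid-transit epoch parameter `t_mid` are assumptions made for illustration; the default window of 0.1 in phase around mid-transit follows the description of the folded lightcurves in the caption of Fig. 2.

```python
import numpy as np

def fold_near_transit(t, mag, period, t_mid, half_width=0.1):
    """Fold a lightcurve on `period` and keep points near mid-transit.

    Returns phases in [-half_width, half_width], with phase 0 at mid-transit,
    and the corresponding magnitudes, both sorted by phase.
    """
    t = np.asarray(t, dtype=float)
    mag = np.asarray(mag, dtype=float)
    # Phase in [-0.5, 0.5), centred on the mid-transit epoch.
    phase = ((t - t_mid) / period + 0.5) % 1.0 - 0.5
    keep = np.abs(phase) <= half_width
    order = np.argsort(phase[keep])
    return phase[keep][order], mag[keep][order]
```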

5 DISCUSSION 

The proposed algorithm reduces to the PCA approach for 
the case of equal uncertainties. It is therefore instructive to
explore the features of our algorithm by analogy with the 
corresponding features of PCA. 

In the case of equal uncertainties the vectors $\{{}^{(k)}a_j; j = 1, \ldots, M\}$
are the eigenvectors of the covariance matrix $R^T R$,
where $R$ is the measurement matrix $\{r_{ij}; i = 1, \ldots, N, j = 1, \ldots, M\}$.
Since $R^T R$ is symmetric, the $\{{}^{(k)}a_j; j = 1, \ldots, M\}$ constitute
an orthogonal set of vectors. The first few of these vectors are
therefore an orthogonal base that spans the vector subspace 
of the significant systematic effects. Thus, it may very well 
happen that the strongest effect PCA derives is a linear com- 
bination of some effects we know from prior physical insight, 
like a certain combination of the airmass and the X posi- 
tion on the CCD chip. Conversely, it may so happen that 
two effects about which we have some insight, such as the Y 
position and the lunar phase, span a vector subspace which 
includes much of the power of a third effect, say the air- 
mass. In this case, PCA derives only two significant effects, 
contrary to our prior intuition. 

We suggest that our algorithm exhibits similar behaviour.
It is true that in the general case of unequal uncertainties
the orthogonality of the $\{a_j\}$ is not guaranteed, but
the same qualitative behaviour probably persists.

Figure 1. The differences between the derived measurements and their original values (without the systematic effects). Panel (a)
presents the measurements before any correction was applied, (b) after applying PCA, and (c) after applying the proposed algorithm.
Data are plotted as a function of the stellar r.m.s. 2000 randomly chosen measurements are plotted.



The recent large photometric surveys and the planned 
photometric space missions (e.g., CoRoT, Kepler) will face 
not only the problem of systematic effects, but also the prob- 
lem of long-term stellar variability. It turns out that the 
proposed solution can potentially remove some of this variability.
In this case, the various $\{a_j\}$ would assume the values
of some function, $f$, of the timing of the $j$-th image:
$a_j = f(t_j)$. For equal uncertainties, the space of possible
time variability can be spanned by an orthogonal basis of
functions (e.g., trigonometric functions in the case of evenly
spaced time sampling). From the PCA point of view, these
basis functions (like $\{{}^{(k)}a_j = \cos(\omega_k t_j); k = k_1, k_2, \ldots\}$)
can be thought of as systematic effects. The contribution
of each basis function to the individual stars is reflected
through the stellar coefficients, ${}^{(k)}c_i$. Removing those effects
amounts to removing part of the power of the long-term variability.
Once again, the general case probably shows similar
behaviour.

In general, the main use of PCA has been to reduce the
dimensionality of the data by finding only the significant factors.
Thus, an important question in PCA (Murtagh & Heck
1987, Ch. 2) is the number of significant factors to retain.
In PCA, it is easy to solve for the complete set of effects 
(all eigenvectors of the covariance matrix), and then decide 
about the significant factors. In the general case of unequal 
uncertainties, we can proceed in two alternative ways. One 
way is to solve simultaneously for an assumed number of 
effects. The other alternative is to solve for the effects in se- 
quential stages. In each stage we subtract the effects found 
in previous stages before solving for a new effect. The two
alternatives are equivalent in the PCA case (equal uncertainties),
but in the general case they lead to different solutions.
Moreover, subtracting the effect in each stage opens
up the possibility of subtracting the effect not globally, but 
only from the stars which are most affected by it. We plan 
to further explore these issues in order to gain more insight 
into the features of the solution. 

We are currently applying the algorithm presented here 
to parts of the OGLE III data set. We have already found 
a few intriguing new planetary transit candidates, and we 
are still evaluating the statistical significance of these find- 
ings. It would be of great interest to apply our algorithm 
to space mission data, like HST photometry, to find out 
how large the systematic effects hidden in the data are. As 
we have demonstrated, the advantages of our algorithm are 
most pronounced in a data set of high S/N measurements 
with substantially varying uncertainties. Data from space 
missions exactly fit this description.

Figure 2. Folded lightcurves of the three transits planted in the data before and after the first few iterations. Each column depicts the
lightcurves of one transit. The top row shows the uncorrected data, while the following rows show it after successive applications of the
algorithm. The lightcurves are folded on the transit periods and show the points which lie within 0.1 phases of the middle of the transit.